Abstract.

A novel method for correlation analysis using scale-dependent Renyi entropies is described. The method 
involves calculating the entropy of a data distribution as an explicit function of the scale of a d-dimensional 
partition of d-cubes, which is dithered to remove bias. Analytic expressions for dithered scale-local entropy 
and dimension for a uniform random point set are derived and compared to Monte Carlo results. Simulated 
nontrivial point-set correlations representing condensation and clustering are similarly analyzed. 

1. Introduction

Entropy is a measure of the size of a data distribution contained within a bounded region (distribution 
support) of some space. In a thermodynamic context this distribution size is interpreted as the number of 
quantum states accessible to a dynamical system given macroscopic constraints. More generally, if a measure 
space is partitioned, the measure distribution size is estimated by the effective number of partition elements 
given the distribution weighting. 

Definition of the space partition is a central element of entropy calculation. The partition is sometimes
defined as a small-scale limiting partition of the space (e.g., thermodynamic limit, limit procedures in classical
analysis [3]), sometimes based on properties of the data distribution itself and/or on the analysis goals (as
in wavelet analysis).

We define entropy as an explicit function of the partition definition, a scaled binning of the measure
space. We calculate the entropy and related quantities as functions of the partition scale(s), similar to
the multi-scale partition approaches used in fractal dimension calculations and image deconvolution.
Unlike some analysis methods we invoke no scale limits. Quantities are defined on bounded scale intervals
explicitly excluding asymptotic limits; the analysis system is in this sense scale local.

The end result of this approach is an entropy which represents arbitrary data correlations as a 
distribution on scale, particularly useful for problems where the detailed scaling behavior of correlations 
over substantial scale intervals is of interest (e.g., condensation, coalescence, critical phenomena, strange
attractors), where instrumental effects may distort scale distributions, and where the correlation structure
is not simply expressible as a power law or other elementary function.

In this paper we describe precision binning methods, define the basic scale-local entropy measure and 
generalize other aspects of information theory to define the scale-dependent entropy difference or information 
between an object distribution and a model reference as a differential correlation measure. Based on 
scale derivatives of entropy and information we define scale-local dimension and dimension transport as 
generalizations of conventional counterparts based on limit concepts. We apply these correlation measures 
to several simulations and real data analysis problems. 

2. Scale-local Entropy 

The entropy definition employed here is based on C_q(ε), the rank-q correlation integral at scale ε. Given
a data distribution in a d-dimensional primary measure space spanned by variables {x_i}, we consider a set
of corresponding correlation spaces containing all possible q-point clusters of data points (q-tuples). There
is one such q-point distribution (in a q-fold Cartesian product space) for each unique q value. The q-point
correlation integral is the projection of the q-point distribution onto its difference subspace spanned by
{x_i − x_j}, integrating over the sum variable(s). The integration limit of the correlation integral on the
difference variables is in the simplest case (isotropic binning) the single scale of the analysis.

The reciprocal of the correlation integral estimates the effective bin number in the d(q − 1)-dimensional
difference subspace. Its counterpart in the primary measure space is the (q − 1)th root, the effective bin
number in the primary space. Defining entropy as the logarithm of the effective primary-space bin number
is consistent with entropy as a logarithmic size measure and is a generalization of the thermodynamic
definition, the logarithm of the number of accessible states. The correlation integral can be approximated
by binning the primary measure space, and expressed in terms of normalized bin contents p_i(ε), in which
case C_q(ε) ≈ Σ_{i=1}^{M(ε)} p_i^q(ε). This results in the rank-q Renyi entropy [7]

$$S_q(\epsilon) = \frac{1}{1-q}\,\log\left[\sum_{i=1}^{M(\epsilon)} p_i^q(\epsilon)\right]. \qquad (1)$$

The entropy S_q, the number of occupied bins M, and the bin probability p_i are explicit functions of the
binning scale ε. More generally, a non-isotropic binning (one utilizing bins without unit aspect ratio) would
imply a multidimensional scale space.
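
As an illustration of Eq. (1), the following Python sketch bins a point set on a single, undithered grid of
d-cubes and evaluates the rank-q entropy from the normalized bin contents. The function name
`renyi_entropy`, the `origin` offset parameter and the use of NumPy are assumptions made for this example
and are not part of the analysis system described here; the q = 1 branch uses the standard Shannon limit of
the Renyi entropy.

```python
import numpy as np

def renyi_entropy(points, eps, q, origin=0.0):
    """Rank-q Renyi entropy S_q(eps) on one grid of d-cubes of side eps.

    points : (N, d) array of coordinates
    origin : grid offset (hypothetical parameter; scalar or length-d array)
    """
    points = np.asarray(points, dtype=float)
    # Integer bin index of every point along every axis.
    idx = np.floor((points - origin) / eps).astype(np.int64)
    # Count points per occupied bin; empty bins never appear in the sum.
    _, counts = np.unique(idx, axis=0, return_counts=True)
    p = counts / counts.sum()                 # normalized bin contents p_i
    if q == 1:
        return -np.sum(p * np.log(p))         # Shannon limit, avoids 1/(1-q)
    return np.log(np.sum(p ** q)) / (1.0 - q)

rng = np.random.default_rng(0)
pts = rng.random((1000, 2))                   # uniform points on the unit square
for q in (0, 2, 5):
    print(q, renyi_entropy(pts, eps=0.1, q=q))
```

For q = 0 the sum counts the occupied bins, so the function returns log M, the logarithm of the number of
occupied bins.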

3. Scale-local Information 

Given scale-local entropy we define scale-local information as a basis for differential comparisons between 
object and reference distributions, data and model. There are a number of possible information definitions 
from information theory and topology, with significant differences in performance. We define information 
here as the difference between entropies for object and reference distributions. This implies that the effective 
bin number for an object distribution is compared in ratio to that of a reference. Nonzero information implies 
multiplicative reduction of effective bin number by increased correlation structure in the object distribution 
relative to the reference (cumulant analysis is an alternative differential approach emphasizing linear or 
additive reduction of distribution size). Scale-local information is then defined as

$$I_q(\epsilon) = S_{q,\mathrm{ref}}(\epsilon) - S_{q,\mathrm{obj}}(\epsilon). \qquad (2)$$

Information so defined provides a differential comparison between a data or object distribution and a model
or reference. For example, the reference distribution for an arbitrary point set would be a distribution with
the same number of points which maximally 'fills' the bounded support - a uniform random distribution.
Since the uniform reference is useful in many applications we derive explicit analytic forms for its entropy
and dimension.

4. Binning and Dithering

Although these scale-local analysis methods are completely general as to the nature of the measure
distribution, for the purpose of illustration we emphasize point sets in this paper. The measure distribution
is then a set of points in a d-dimensional embedding space (e.g., the distribution of particles from a heavy-ion
collision in momentum space). The analysis begins by applying a partition to the embedding space. For
algebraic simplicity we consider a grid of d-dimensional cubes, an isotropic binning. At each scale there is a
continuum of possibilities for the relative position of the binning system on the embedding space. Differing
partition placement affects the analysis in general, and there is no a priori reason to prefer any single
placement. Thus, we average (dither) over all partition placements to calculate the entropy at each scale.

We define a dithering phase φ for each of the d embedding-space dimensions. The relationship between
the partition system and the measure distribution is controlled by varying φ. To implement dithering we
calculate the correlation integral Σ_i p_i^q(ε) of an event J times at each scale, incrementing φ each time.

Finally we average over these results to obtain the entropy. The dithered entropy S_q(ε) is thus

$$S_q(\epsilon) = \frac{1}{1-q}\,\log\left[\frac{1}{J}\sum_{\phi}\,\sum_{i=1}^{M(\epsilon,\phi)} p_{i,\phi}^q(\epsilon)\right], \qquad (3)$$

where p_{i,φ}(ε) is the bin probability of the ith bin for binning phase φ and scale ε. p_{i,φ} is normalized so that
Σ_i p_{i,φ} = 1 at each scale ε and dithering phase φ, the sum taken over all occupied bins in each partition.

5. A Simple Example

A simple application of scale-local entropy and dithered binning illustrates the analysis process. We obtain
the scaled entropy of a 2D uniform distribution of N randomly generated points on a unit-square support for
several values of index q. A Monte Carlo simulation with 50k points is shown in figure 1 along with analysis
results for q = 0, 2, 5. To interpret these results we consider small-, intermediate-, and large-scale regimes.

Figure 1. Results of scale-local entropy analysis applied to a distribution of 50,000 random points uniformly
distributed on a 2D unit square with L = 1. A box plot of the distribution itself is shown in the left panel;
the right panel shows the measured entropy for q = 0 (dashed line), q = 2 (dotted line), and q = 5 (dot-dashed
line).



In the small-scale limit, well below the point-separation scale (ε ≪ L/√N), each distribution point occupies
a single bin; there are N occupied bins (M = N), each with bin probability p_i = 1/N. Thus, at small scale
the entropy approaches

$$\lim_{\epsilon \to 0} S_q(\epsilon) = \frac{1}{1-q}\,\log\left[\sum_{i=1}^{N}\left(\frac{1}{N}\right)^{q}\right] = \log N. \qquad (4)$$



At intermediate scales we idealize the uniform point set to a continuum (N → ∞). The number of occupied
bins is then simply M = (L/ε)² and the bin probability is p_i = 1/M, giving S_q = 2 log(L/ε). At scales
substantially greater than the boundary scale (ε ≫ L) the entire distribution is contained in a single bin:
M = 1 and p_i = 1. The rank-q entropy thus vanishes at scales much larger than the distribution boundary
size. The detailed q-dependence near the particle-separation and boundary scales is amenable to an analytic
treatment.
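
The dithered binning of Section 4 and the three regimes just described can be checked with a small Monte
Carlo sketch. The code below is an illustration under stated assumptions rather than the implementation
used for figure 1: the name `dithered_entropy`, the use of random grid phases (the text increments φ
systematically), the choice of 16 phases and the smaller point count N = 20000 are all choices made for this
example.

```python
import numpy as np

def dithered_entropy(points, eps, q, n_dither=16, rng=None):
    """Dither-averaged rank-q entropy (q != 1).

    The correlation integral sum_i p_i^q is averaged over random grid
    phases first; the logarithm is taken afterwards.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    c_sum = 0.0
    for _ in range(n_dither):
        phase = rng.random(points.shape[1])       # one phase per axis, in units of eps
        idx = np.floor(points / eps + phase).astype(np.int64)
        _, counts = np.unique(idx, axis=0, return_counts=True)
        p = counts / counts.sum()                 # bin probabilities for this phase
        c_sum += np.sum(p ** q)                   # correlation integral for this phase
    return np.log(c_sum / n_dither) / (1.0 - q)

rng = np.random.default_rng(1)
N, L = 20000, 1.0
pts = rng.random((N, 2)) * L                      # uniform points on the square support

s_small = dithered_entropy(pts, 1e-6, 2, rng=rng)  # eps << L / sqrt(N)
s_mid = dithered_entropy(pts, 0.05, 2, rng=rng)    # L / sqrt(N) << eps << L
s_large = dithered_entropy(pts, 50.0, 2, rng=rng)  # eps >> L

print("small scale:", s_small, " log N        =", np.log(N))
print("intermediate:", s_mid, " 2 log(L/eps) =", 2 * np.log(L / 0.05))
print("large scale:", s_large)
```

The intermediate-scale value is only approximately 2 log(L/ε), since edge bins and the finite point count
both contribute at this level of precision.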



6. Derivation of S_q(ε) for a Bounded Uniform Distribution

Information is a relative quantity, a matter of definition. It is impossible to make an absolute determination 
of the information content of an arbitrary distribution. Using a maximum-entropy reference we can measure 
information relative to a distribution which is minimally correlated (given certain constraints). This 
motivates us to derive an algebraic form for the scale-local entropy of a bounded uniform distribution (BUD). 
This distribution represents a maximum-entropy hypothesis within a boundary, a maximum filling of the 
distribution support. This is a correlation reference from which any object distribution may deviate with 
reduced entropy.† The derivation is presented in two parts: scales below and above the boundary scale.

6.1. Below the boundary scale: ε < L

For partitions below the boundary scale there is at least one bin in the interior of the embedding space.
To derive an analytic form for the entropy of a BUD we consider a two-dimensional distribution (and later
generalize to d dimensions). The BUD is defined on a square support with side length L. Because the
distribution is uniform the probability of finding a point in any given bin is simply determined by the bin
area. Figure 2 shows a BUD binned with a general rectangular binning (thin dark lines). We calculate the
bin contents for each bin (as a function of scale) and integrate over all possible dithering configurations to
determine the analytic form of the entropy. For bins that are entirely contained within the embedding space
(interior bins) the bin probability is trivial: p_i = ε_x ε_y / L² is the area of the bin divided by the total area of
the support, independent of bin dithering. For edge and corner bins the problem is more complicated. We
consider each bin type - interior (white), edge (light grey) and corner (dark grey) - separately.

† Boundedness is itself a form of correlation in a self-consistent description.

Figure 2. A binning cartoon that illustrates the relationships among the binning, distribution support and
dithering variables for the case of a two-dimensional embedding space.

Corner bins contain both an x- and y-axis support edge. For all dithering configurations there are four
corner bins. The effective area of these corner bins is ε_x ε_y scaled down by the fraction of the bin along the
ith axis (i ∈ [1, d]) that overlaps the data. By definition, the phase along an axis is φ_i = (1 − α_i), where
α_i is the bin overlap fraction for the first bin on axis i (φ_i ∈ [0, 1]). We define Δ_i = (L/ε_i) − int(L/ε_i) and
express the amount of overlap between the last bin on axis i and the edge of the support along that axis as
φ_i + Δ_i − int(φ_i + Δ_i). With these definitions, calculating the contribution to the correlation integral
of the corner bins is a matter of integrating over all φ values using the relevant bin probabilities. Labeling
the corner bins from right to left starting with the upper left bin we write down the φ-dependent corner bin
probabilities as

$$\begin{aligned}
p_a^q &= (1-\phi_x)^q\,(1-\phi_y)^q \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q}, \\
p_b^q &= (1-\phi_x)^q\,[\phi_y+\Delta_y-\mathrm{int}(\phi_y+\Delta_y)]^q \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q}, \\
p_c^q &= [\phi_x+\Delta_x-\mathrm{int}(\phi_x+\Delta_x)]^q\,(1-\phi_y)^q \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q}, \\
p_d^q &= [\phi_x+\Delta_x-\mathrm{int}(\phi_x+\Delta_x)]^q\,[\phi_y+\Delta_y-\mathrm{int}(\phi_y+\Delta_y)]^q \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q}.
\end{aligned} \qquad (5)$$

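We calculate the dither-averaged correlation integral C_q = ⟨Σ_i p_i^q⟩_φ by integrating over the different
dithering configurations for each bin and summing, ⟨Σ_i p_i^q⟩_φ = Σ_i ⟨p_i^q⟩_φ. Following this approach
we integrate the expressions over the two dithering variables and sum results to calculate the correlation
integral. The φ integral over the first corner bin yields

$$\langle p_a^q \rangle_\phi = \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q} \int_0^1\!\!\int_0^1 (1-\phi_x)^q\,(1-\phi_y)^q \, d\phi_x\, d\phi_y = \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q} \frac{1}{(1+q)^2}. \qquad (6)$$

The second term is

$$\langle p_b^q \rangle_\phi = \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q} \int_0^1\!\!\int_0^1 (1-\phi_x)^q\,[\phi_y+\Delta_y-\mathrm{int}(\phi_y+\Delta_y)]^q \, d\phi_x\, d\phi_y = \left(\frac{\epsilon_x\epsilon_y}{L^2}\right)^{q} \frac{1}{(1+q)^2}. \qquad (7)$$

The third and fourth terms are similar to the first and second; each of the four corner bins contributes
(ε_x ε_y / L²)^q / (1 + q)² to the total dither-averaged correlation integral.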
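
The dither averages over the corner-bin overlap factors can be verified numerically. The sketch below
evaluates the two phase integrals for the first and second corner bins with a midpoint rule and compares them
with 1/(1 + q)²; the common factor (ε_x ε_y / L²)^q is omitted. The function name `corner_averages`, the
sample value Δ_y = 0.37 and the grid size are assumptions made for this example.

```python
import numpy as np

def corner_averages(q, delta_y, n=2000):
    """Midpoint-rule dither averages of the overlap factors of corner bins a and b."""
    phi = (np.arange(n) + 0.5) / n                # phase midpoints on [0, 1]
    first = (1.0 - phi) ** q                      # (1 - phi)^q, first-bin overlap
    last = ((phi + delta_y) % 1.0) ** q           # [phi + Delta - int(phi + Delta)]^q
    # The x and y phases are independent, so the double integral factorizes.
    avg_a = first.mean() * first.mean()
    avg_b = first.mean() * last.mean()
    return avg_a, avg_b

for q in (0, 2, 5):
    avg_a, avg_b = corner_averages(q, delta_y=0.37)
    print(q, avg_a, avg_b, 1.0 / (1.0 + q) ** 2)
```

The result does not depend on Δ_y because the fractional part of φ_y + Δ_y is itself uniformly distributed
on the unit interval when φ_y is.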

The contributions from edge bins are simpler to calculate. The overlap fraction is unity in the direction
parallel to the support edge; along that axis we merely count the number of edge bins. The integral along the
second axis is similar to that for the corner bins. Again there are four terms, but symmetry simplifies the
problem: the expression for the x-axis border bins gives the contributions from the y-axis border bins when
the indices are switched. The contribution from edge bins thus carries the same factor (ε_x ε_y / L²)^q as the
corner bins.

There remains the integral over interior bins, a simple matter of bin counting.

The full correlation integral can be assembled from the corner, edge, and interior bin integrals.

Inserting this result into the definition of the rank-q entropy we find that

$$S_q(\epsilon) = \frac{1}{1-q}\,\log\left[C_q(\epsilon)\right]. \qquad (12)$$

Because of the additivity of entropy under product this entropy expression contains two terms, one for
each axis, which suggests the d-dimensional generalization: adding an equivalent term for each additional
dimension.



6.2. Above the boundary scale: ε > L

For the derivation of Eq. (12) we considered contributions from three types of bins: corner, border, and
interior. In the 2D derivation this approach is only valid when both ε_x < L and ε_y < L. If either of the
scales is larger than the support size there are no interior bins and Eq. (12) is not valid. Thus, to obtain the
entropy expression for a BUD at large scale we consider a single axis (we now exploit the additivity of entropy
with respect to dimension for uncorrelated systems) with ε > L. When there is a bin edge in the support
there are exactly two corner bins; we express the bin probability of the second bin in terms of the first

$$p_2^q = [1 - p_1]^q. \qquad (13)$$

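When the support fits completely inside a single bin (p_1^q = 1) the distance from the support edge to the
nearest bin edge is larger than the size of the support (αε > L). We now evaluate the relevant integrals,
for a bin edge inside the support (αε < L) and for the support inside a single bin (αε > L) respectively:

$$\langle p_1^q \rangle_\alpha = \int_0^{L/\epsilon} d\alpha \left(\frac{\alpha\epsilon}{L}\right)^{q} = \frac{1}{1+q}\,\frac{L}{\epsilon}, \qquad (14)$$

$$\langle p_1^q \rangle_\alpha = \int_{L/\epsilon}^{1} d\alpha = 1 - \frac{L}{\epsilon}. \qquad (15)$$

Since the definition of corner bins is arbitrary, we could relabel and get the same result; symmetry requires
that ⟨p_1^q⟩_α = ⟨p_2^q⟩_α. These results can now be assembled to calculate the 1D scaled entropy for scales
larger than the support size.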

Combining results, we obtain the rank-q scaled entropy for each degree of freedom of a BUD over all
scales, Eq. (17).

We have generated the exact expression for the scale-local entropy in the general case of a d-dimensional
bounded uniform distribution.
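
The per-axis dither average described in this section can be evaluated numerically without any closed
form, by clipping a shifted grid to the support and averaging Σ_i p_i^q over the phase. The sketch below does
this for a continuum uniform distribution on [0, L]. For comparison it also includes a closed form,
`bud_entropy_1d_closed`, which is an assumption of this sketch: it was obtained by evaluating the
bin-counting argument above (two partial bins plus an average of L/ε − 1 interior bins below the boundary
scale, at most one bin edge inside the support above it) and is not quoted from Eq. (17). All function names
and the phase-grid size are likewise choices made for this example.

```python
import numpy as np

def bud_entropy_1d_numeric(eps, q, L=1.0, n_phase=4000):
    """Dither-averaged S_q(eps) for a continuum uniform distribution on [0, L] (q != 1)."""
    c_sum = 0.0
    for phi in (np.arange(n_phase) + 0.5) / n_phase:
        # Grid edges at (k + phi) * eps, from the last edge below 0
        # to the first edge at or above L.
        k_hi = int(np.ceil(L / eps - phi))
        edges = (np.arange(-1, k_hi + 1) + phi) * eps
        widths = np.diff(np.clip(edges, 0.0, L))   # overlap of each bin with the support
        p = widths[widths > 0.0] / L               # bin probabilities for this phase
        c_sum += np.sum(p ** q)
    return np.log(c_sum / n_phase) / (1.0 - q)

def bud_entropy_1d_closed(eps, q, L=1.0):
    """Closed form assumed in this sketch (derived here, not taken from the source)."""
    ratio = min(eps, L) / max(eps, L)
    bulk = np.log(L / eps) if eps < L else 0.0     # log(L/eps) below the boundary scale
    return bulk + np.log(1.0 + (1.0 - q) / (1.0 + q) * ratio) / (1.0 - q)

for eps in (0.3, 2.5):
    for q in (0, 2, 5):
        print(eps, q, bud_entropy_1d_numeric(eps, q), bud_entropy_1d_closed(eps, q))
```

By the additivity of entropy over uncorrelated axes, a d-dimensional value under the same assumptions would
be the sum of d such per-axis terms.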

7. Point Sets as BUDs

Eq. (17) was derived in the continuum limit; it does not describe a uniform random point set. We generalize
to the reference for discrete distributions by introducing an additional factor in Eq. (17). The entropy of
a bounded random uniform distribution (BRUD) follows the BUD reference at larger scales but eventually
approaches log(N) as a limiting value for scales smaller than the typical two-point separation. The BUD
reference can be generalized to the discrete case if we account for the appearance of void bins at smaller scales.
The fraction of void bins at scale ε for a Poisson-distributed point set is exp[−μ_0(ε)], where μ_0(ε) = Nε/L
is the average bin occupancy for all bins within the boundary. The fraction of occupied bins (the support of
the point set) is then f(ε) = 1 − exp(−Nε/L). Incorporating f(ε) into the BUD entropy yields the BRUD
result, Eq. (18).

Figure 3. Scale-local entropy and information for a 2D uniform random distribution of 50k points (q = 0
dashed, q = 2 dotted, q = 5 dot-dashed lines) with analytic BRUD entropy for q = 0 (solid line) as a reference.
Data and reference entropy distributions differ near the mean interparticle spacing scale (log(1/√N) ≈ −2.35)
as discussed in the text.
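
The occupied-bin fraction f(ε) = 1 − exp(−Nε/L) used for the BRUD reference can be compared with a direct
count in one dimension, matching the per-axis form μ_0(ε) = Nε/L. The sketch below is a hedged illustration:
the function name `occupied_fraction`, the restriction to bin widths that divide L exactly and the absence of
dithering are simplifying assumptions of this example.

```python
import numpy as np

def occupied_fraction(n_points, n_bins, L=1.0, rng=None):
    """Measured and predicted fraction of occupied bins for uniform points on [0, L)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(n_points) * L
    eps = L / n_bins                               # bin width; the grid is aligned with the support
    idx = np.minimum(np.floor(x / eps).astype(np.int64), n_bins - 1)
    measured = np.unique(idx).size / n_bins        # bins holding at least one point
    predicted = 1.0 - np.exp(-n_points * eps / L)  # f(eps) = 1 - exp(-N eps / L)
    return measured, predicted

rng = np.random.default_rng(3)
for n_bins in (200, 2000, 20000):
    print(n_bins, occupied_fraction(2000, n_bins, rng=rng))
```

At large scale nearly every bin is occupied, while well below the point-separation scale the occupied
fraction falls toward Nε/L, so the number of occupied bins approaches N.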



This entropy is precise for a discrete, random point set and q = 0. For q > 0 the reference entropy differs
from the entropy for real point sets in a small scale interval near ε ≈ L/√N, the typical two-point separation.
This information (entropy difference) derives from the fact that the form of Renyi entropy employed here is
based on ordinary moments (averaged powers of p_i(ε)), whereas the point-set data are Poisson distributed.
A correlation integral based on factorial moments would give zero information for uniform (uncorrelated)
point sets.

We could conclude that a factorial-moment approximation to the correlation integral should be used for
point sets at least, if not other applications. However, extending our previous observation that boundedness
itself is a form of correlation in a self-consistent system, we can also view the point set as a result of
increasing correlation (coalescence) of a continuous distribution at a characteristic scale. The point set does
have correlations additional to the boundedness of the continuous BUD. The apparent discrepancies in the
entropies of BRUD and real point sets (the nonzero I_q in Fig. 3) reveal a genuine correlation feature in a
discrete point set compared to a continuum. The Renyi entropies based on ordinary moments are preferred
as a more general formulation applicable to arbitrary measure distributions. The information corresponding
to the difference between factorial moments and ordinary moments for the uniform random point set is
meaningful and can be expressed analytically (the subject of a future paper).



8. Scale-local Dimension 



Having obtained entropy and information as scale-local distributions we can similarly express the dimension 
of a distribution as a function of scale. We start with a conventional definition of dimension based on
asymptotic limits (Eq. 19).



In this approach dimension is a single number defined in the limit of a zero-scale partition. The slope of 
the Sq{e) distribution is evaluated in the limit of zero scale. This definition favors certain specific types of 
correlation (power laws), and assumes that the limit exists, which may not be true even in principle. For 
more general cases the results can be misleading. We relax the asymptotic limit restriction as we did with 
scale-local entropy to obtain a more general scale-local dimension 

Applying this definition to the generalized entropy of a BRUD in Eq. (18) yields the analytic q = 0 dimension
shown in Figure 4.

Figure 4. Entropy and dimension plotted as a function of partition scale for a randomly generated uniform
distribution (q = 0 dashed, q = 2 dotted, q = 5 dot-dashed lines). The solid lines show the analytical q = 0
results derived for a generalized BRUD.



Dimension expressed as a scale-local distribution is a novel aspect of this entropy treatment. To
understand how scale can affect the inferred dimension of an object consider the apparent dimension of
a planet in the solar system from different viewpoints. For an observer on one planet other planets appear to
the eye as isolated points (with zero dimension). With a powerful telescope the resolution size (scale) of the
observation decreases substantially. Planets appear as 2D disks with a 1D border. A radar probe orbiting a
planet can determine that the planet at smaller scale has a rich 3D surface and internal structure that could
not be supported by a 0D point or 2D plane. Continuing to the atomic scale, planets are made of atoms
and molecules that appear point-like. At this scale a planet's dimensionality returns to zero. This general
principle is illustrated in figure 4.
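
A scale-local dimension can be estimated from a sampled entropy curve by finite differences. The sketch
below assumes the form d_q(ε) = −∂S_q(ε)/∂log(ε), which is inferred here from the stated slope
interpretation and from the dimension-transport definition in Section 9 rather than quoted from the
defining equation; the function name `scale_local_dimension` and the toy entropy curve (a continuum 2D
uniform distribution with edge effects ignored) are also assumptions of this example.

```python
import numpy as np

def scale_local_dimension(eps, entropy):
    """d_q(eps) = -dS_q / dlog(eps), by finite differences on the sampled scale grid."""
    log_eps = np.log(np.asarray(eps, dtype=float))
    return -np.gradient(np.asarray(entropy, dtype=float), log_eps)

eps = np.logspace(-3, 1, 81)                       # scales from 1e-3 L to 10 L, with L = 1
# Toy entropy: S = 2 log(L/eps) below the boundary scale, zero above it.
S = np.where(eps < 1.0, 2.0 * np.log(1.0 / eps), 0.0)
d = scale_local_dimension(eps, S)

print(d[10], d[-1])                                # well below and well above the boundary scale
```

With this toy input the estimated dimension is 2 well below the boundary scale and 0 well above it, with a
transition confined to the few samples around ε = L.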



9. Analysis of Hierarchically Organized Point Sets 

As an exercise in precision correlation analysis using scale-local entropy and dimension we model cluster 
formation via condensation by generating a hierarchical point distribution with correlation features 
distributed over a range of scales. This model is relevant to phase transitions and complex systems analysis. 
To create a two-dimensional, two-tier cluster hierarchy on a square region of side L we generate a uniform
random distribution of N_0 cluster sites, providing correlations at the characteristic length scale L/√N_0,
the mean site separation. At each cluster site we throw a randomly generated uniform distribution of N_1
points with width δ_1, giving the distribution a second characteristic length scale δ_1/√N_1.
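
The two-tier construction just described can be sketched in a few lines. The function name
`two_tier_hierarchy`, the choice of square clusters centred on their sites and the lack of any clipping at
the boundary of the region are assumptions of this example; the parameter values are those quoted for the
two-tier hierarchy analysed below.

```python
import numpy as np

def two_tier_hierarchy(n_sites, n_per_site, delta, L=1.0, rng=None):
    """Uniform random cluster sites on an L x L square, each carrying a uniform cluster of width delta."""
    rng = np.random.default_rng() if rng is None else rng
    sites = rng.random((n_sites, 2)) * L                       # first tier: cluster sites
    # Second tier: offsets uniform on a square of side delta centred on each site
    # (centring is an assumption; points near the boundary may fall slightly outside the region).
    offsets = (rng.random((n_sites, n_per_site, 2)) - 0.5) * delta
    return (sites[:, None, :] + offsets).reshape(-1, 2)

pts = two_tier_hierarchy(100, 100, 0.001, rng=np.random.default_rng(2))
print(pts.shape)    # (10000, 2)
```

Rows of the returned array are grouped by cluster, so reshaping to (n_sites, n_per_site, 2) recovers the
individual clusters.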

Figure 5. Analysis results for a two-tier hierarchy with N_0 = 100, N_1 = 100 and δ_1 = 0.001. The scaled
dimension of the data (dashed line) is compared to the analytic BRUD reference with the corresponding
number of points (N = N_0 N_1 = 10,000) at scale ε/L = 1 (black solid line) as well as the analytic reference
for N = N_0 = N_1 = 100 at scales 1 and 0.001 (gray solid lines).

If the two tiers of the hierarchy are sufficiently separated on scale, as in Fig. 5, the sub-structure of
the clusters is not evident to the analysis at large scale (ε ≫ δ_1). The distribution appears to be a BRUD
of N_0 random points. At smaller scale (ε ≲ δ_1) the apparent structure is dominated instead by the internal
cluster structure, a BRUD of N_1 points. This two-tiered hierarchical distribution is a first approximation to
a self-similar distribution: it appears as a BRUD at scales L and δ_1 simultaneously (see Fig. 5). Extending
the hierarchy by recursive self-similar cluster generation would converge to the limiting case of a fractal point
distribution over an arbitrarily large scale interval (e.g., Cantor set). The 'fractal' dimension would depend
on the relations among the hierarchy scale separation, the cluster size and the point count.

The lower right panel of figure 5 shows the dimension transport for the two-tier hierarchy. Dimension
transport is defined as the scale derivative of information, Δd_q(ε) = −∂[I_q(ε)]/∂[log(ε)], and is a measure of
the scale-dependent dimension difference between reference and object distributions. Dimension transport
measures increasing correlation as the transport of dimensionality from larger to smaller scale. In the
two-tier hierarchy example, anticorrelation of points at larger scale (relative to the BRUD of the same
multiplicity) arises when points condense toward the cluster sites. The anticorrelation at larger scale
results in a reduction of larger-scale dimensionality (the system appears more point-like) and an increase in
the local point density and dimensionality at smaller scale.
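
Given entropy curves for a reference and an object sampled on the same scale grid, information and
dimension transport follow directly from their definitions, I_q = S_q,ref − S_q,obj and
Δd_q = −∂I_q/∂log(ε). The sketch below uses finite differences; the function names and the toy power-law
entropy curves are assumptions of this example and do not represent the hierarchy data.

```python
import numpy as np

def information(s_ref, s_obj):
    """Scale-local information: reference entropy minus object entropy."""
    return np.asarray(s_ref, dtype=float) - np.asarray(s_obj, dtype=float)

def dimension_transport(eps, s_ref, s_obj):
    """Delta d_q(eps) = -dI_q / dlog(eps), by finite differences on the scale grid."""
    log_eps = np.log(np.asarray(eps, dtype=float))
    return -np.gradient(information(s_ref, s_obj), log_eps)

eps = np.logspace(-3, -1, 41)
s_ref = 2.0 * np.log(1.0 / eps)       # toy reference with dimension 2 at these scales
s_obj = 1.5 * np.log(1.0 / eps)       # toy object with dimension 1.5 at these scales
dd = dimension_transport(eps, s_ref, s_obj)

print(dd[0], dd[20], dd[-1])
```

With these toy inputs the transport equals the dimension difference between reference and object, 0.5, at
every sampled scale.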

Figure 6. A picture of the gradual onset of cluster formation (δ_1 = 0.03) in a system with a fixed multiplicity
(N = 10,000). Data are compared to a reference BRUD with equal multiplicity.



The extended two-tier condensation example in figure 6 shows how small-scale correlations increase by
condensing points of a BRUD onto cluster sites. At the onset of cluster formation (~3000 cluster sites, ~3
points per cluster) the transport of dimension to smaller scale is barely visible (but still non-statistical).
When the size of the clusters becomes significant (~1000 cluster sites, ~10 points per cluster) the analysis
indicates what the eye perceives directly, that the distribution of points is in some way correlated. When
the cluster size is 10% of the number of clusters (~300 cluster sites, ~30 points per cluster) the dimension
transport shows quite dramatically and quantitatively the transport of dimension from larger to smaller
scale.

10. Summary and Conclusions 

We have developed a novel analysis system which is well-suited to the task of precision correlation analysis 
for general measure distributions and especially for systems which exhibit clustering or other self-similar 
behavior. By extending the Renyi-entropy concept to a locally-defined function of scale we are able 
to establish a more complete picture of data correlations and make precision comparisons among data, 
simulations and model distributions in the context of information theory. Comparison of Monte Carlo results 
and analytic distributions has led to a detailed understanding of scale-local entropy measures. Analysis
of simulated clustering data suggests the power of this method in the quantification of scale-dependent 
correlation structure. 